NSF PAR Search | NSF Public Access Repository


A Comparison of AI Weather Prediction and Numerical Weather Prediction Models for 1–7-Day Precipitation Forecasts

https://doi.org/10.1175/WAF-D-24-0081.1

Radford, Jacob T; Ebert-Uphoff, Imme; Stewart, Jebb Q (April 2025, Weather and Forecasting)

Abstract Pure artificial intelligence (AI)-based weather prediction (AIWP) models have made waves within the scientific community and the media, claiming superior performance to numerical weather prediction (NWP) models. However, these models often lack impactful output variables such as precipitation. One exception is Google DeepMind’s GraphCast model, which became the first mainstream AIWP model to predict precipitation, but performed only limited verification. We present an analysis of the ECMWF’s Integrated Forecasting System (IFS)-initialized (GRAP_IFS) and the NCEP’s Global Forecast System (GFS)-initialized (GRAP_GFS) GraphCast precipitation forecasts over the contiguous United States and compare to results from the GFS and IFS models using 1) grid-based, 2) neighborhood, and 3) object-oriented metrics verified against the fifth major global reanalysis produced by ECMWF (ERA5) and the NCEP/Environmental Modeling Center (EMC) stage IV precipitation analysis datasets. We affirmed that GRAP_GFS and GRAP_IFS perform better than the GFS and IFS in terms of root-mean-square error and stable equitable errors in probability space, but the GFS and IFS precipitation distributions more closely align with the ERA5 and stage IV distributions. Equitable threat score also generally favored GraphCast, particularly for lower accumulation thresholds. Fractions skill score for increasing neighborhood sizes shows greater gains for the GFS and IFS than GraphCast, suggesting the NWP models may have a better handle on intensity but struggle with the location. Object-based verification for GraphCast found positive area biases at low accumulation thresholds and large negative biases at high accumulation thresholds. GRAP_GFS saw similar performance gains to GRAP_IFS when compared to their NWP counterparts, but initializing with the less familiar GFS conditions appeared to lead to an increase in light precipitation.
Significance Statement Pure artificial intelligence (AI)-based weather prediction (AIWP) has exploded in popularity with promises of better performance and faster run times than numerical weather prediction (NWP) models. However, less attention has been paid to their capability to predict impactful, sensible weather like precipitation, precipitation type, or specific meteorological features. We seek to address this gap by comparing precipitation forecast performance by an AI model called GraphCast to the Global Forecast System (GFS) and the Integrated Forecasting System (IFS) NWP models. While GraphCast does perform better on many verification metrics, it has some limitations for intense precipitation forecasts. In particular, it less frequently predicts intense precipitation events than the GFS or IFS. Overall, this article emphasizes the promise of AIWP while at the same time stressing the need for robust verification by domain experts.
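The grid-based and neighborhood metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of equitable threat score (ETS) and fractions skill score (FSS), not necessarily the authors' implementation. The function names, the synthetic 20 x 20 fields, and the zero padding at the domain edges are all assumptions made for illustration.

```python
import numpy as np

def equitable_threat_score(forecast, observed, threshold):
    """ETS for one accumulation threshold (textbook definition)."""
    f = forecast >= threshold
    o = observed >= threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    # Hits expected by chance given the forecast and observed event counts.
    hits_random = (hits + misses) * (hits + false_alarms) / f.size
    denom = hits + misses + false_alarms - hits_random
    return float((hits - hits_random) / denom) if denom != 0 else float("nan")

def neighborhood_fractions(binary, size):
    """Fraction of event cells in a size x size window (size odd).

    Assumption: cells outside the domain count as non-events (zero padding).
    """
    pad = size // 2
    padded = np.pad(binary.astype(float), pad, mode="constant")
    # Integral image with a leading row and column of zeros.
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    rows, cols = binary.shape
    window_sum = (integral[size:size + rows, size:size + cols]
                  - integral[:rows, size:size + cols]
                  - integral[size:size + rows, :cols]
                  + integral[:rows, :cols])
    return window_sum / (size * size)

def fractions_skill_score(forecast, observed, threshold, size):
    """FSS: 1 minus the MSE of neighborhood fractions over its worst case."""
    pf = neighborhood_fractions(forecast >= threshold, size)
    po = neighborhood_fractions(observed >= threshold, size)
    mse = np.mean((pf - po) ** 2)
    reference = np.mean(pf ** 2) + np.mean(po ** 2)
    return float(1.0 - mse / reference) if reference > 0 else float("nan")

# Synthetic example: the forecast has the right intensity and area,
# but the rain cell is displaced three grid points to the east.
observed = np.zeros((20, 20))
observed[8:10, 8:10] = 10.0
forecast = np.zeros((20, 20))
forecast[8:10, 11:13] = 10.0

for size in (1, 3, 7, 11):
    print(size, fractions_skill_score(forecast, observed, 5.0, size))
print("ETS", equitable_threat_score(forecast, observed, 5.0))
```

In this synthetic case the displaced forecast earns no credit at the grid scale, and FSS rises once the neighborhood is wide enough to span the displacement. This is the kind of behavior the abstract points to when it describes larger FSS gains with increasing neighborhood size for a model that captures intensity but misplaces the feature.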
Free, publicly-accessible full text available April 1, 2026
Measuring Sharpness of AI-Generated Meteorological Imagery

https://doi.org/10.1175/AIES-D-24-0083.1

Ebert-Uphoff, Imme; Ver_Hoef, Lander; Schreck, John S; Stock, Jason; Molina, Maria J; McGovern, Amy; Yu, Michael; Petzke, Bill; Hilburn, Kyle; Hall, David M; et al (June 2025, Artificial Intelligence for the Earth Systems)

Abstract AI-based algorithms are emerging in many meteorological applications that produce imagery as output, including for global weather forecasting models. However, the imagery produced by AI algorithms, especially by convolutional neural networks (CNNs), is often described as too blurry to look realistic, partly because CNNs tend to represent uncertainty as blurriness. This blurriness can be undesirable since it might obscure important meteorological features. More complex AI models, such as Generative AI models, produce images that appear to be sharper. However, improved sharpness may come at the expense of a decline in other performance criteria, such as standard forecast verification metrics. To navigate any trade-off between sharpness and other performance metrics it is important to quantitatively assess those other metrics along with sharpness. While there is a rich set of forecast verification metrics available for meteorological images, none of them focus on sharpness. This paper seeks to fill this gap by 1) exploring a variety of sharpness metrics from other fields, 2) evaluating properties of these metrics, 3) proposing the new concept of Gaussian Blur Equivalence as a tool for their uniform interpretation, and 4) demonstrating their use for sample meteorological applications, including a CNN that emulates radar imagery from satellite imagery (GREMLIN) and an AI-based global weather forecasting model (GraphCast).
Free, publicly-accessible full text available June 9, 2026
Accelerating Community-Wide Evaluation of AI Models for Global Weather Prediction by Facilitating Access to Model Output

https://doi.org/10.1175/BAMS-D-24-0057.1

Radford, Jacob T; Ebert-Uphoff, Imme; Stewart, Jebb Q; Musgrave, Kate D; DeMaria, Robert; Tourville, Natalie; Hilburn, Kyle (January 2025, Bulletin of the American Meteorological Society)

Abstract Numerous artificial intelligence-based weather prediction (AIWP) models have emerged over the past 2 years, mostly in the private sector. There is an urgent need to evaluate these models from a meteorological perspective, but access to the output of these models is limited. We detail two new resources to facilitate access to AIWP model output data in the hope of accelerating the investigation of AIWP models by the meteorological community. First, a 3-yr (and growing) reforecast archive beginning in October 2020 containing twice daily 10-day forecasts for FourCastNet v2-small, Pangu-Weather, and GraphCast Operational is now available via an Amazon Simple Storage Service (S3) bucket through NOAA’s Open Data Dissemination (NODD) program (https://noaa-oar-mlwp-data.s3.amazonaws.com/index.html). This reforecast archive was initialized with both the NOAA’s Global Forecast System (GFS) and ECMWF’s Integrated Forecasting System (IFS) initial conditions in the hope that users can begin to perform the feature-based verification of impactful meteorological phenomena. Second, real-time output for these three models is visualized on our web page (https://aiweather.cira.colostate.edu) along with output from the GFS and the IFS. This allows users to easily compare output between each AIWP model and traditional, physics-based models with the goal of familiarizing users with the characteristics of AIWP models and determine whether the output aligns with expectations, is physically consistent and reasonable, and/or is trustworthy. We view these two efforts as a first step toward evaluating whether these new AIWP tools have a place in forecast operations.
Full Text Available
Trust and trustworthy artificial intelligence: A research agenda for AI in the environmental sciences

https://doi.org/10.1111/risa.14245

Bostrom, Ann; Demuth, Julie L; Wirz, Christopher D; Cains, Mariana G; Schumacher, Andrea; Madlambayan, Deianna; Bansal, Akansha Singh; Bearth, Angela; Chase, Randy; Crosman, Katherine M; et al (June 2024, Risk Analysis)

Abstract Demands to manage the risks of artificial intelligence (AI) are growing. These demands and the government standards arising from them both call for trustworthy AI. In response, we adopt a convergent approach to review, evaluate, and synthesize research on the trust and trustworthiness of AI in the environmental sciences and propose a research agenda. Evidential and conceptual histories of research on trust and trustworthiness reveal persisting ambiguities and measurement shortcomings related to inconsistent attention to the contextual and social dependencies and dynamics of trust. Potentially underappreciated in the development of trustworthy AI for environmental sciences is the importance of engaging AI users and other stakeholders, which human–AI teaming perspectives on AI development similarly underscore. Co‐development strategies may also help reconcile efforts to develop performance‐based trustworthiness standards with dynamic and contextual notions of trust. We illustrate the importance of these themes with applied examples and show how insights from research on trust and the communication of risk and uncertainty can help advance the understanding of trust and trustworthiness of AI in the environmental sciences.
Full Text Available